Document Expansion using a Side Collection for Monolingual and Cross-language Spoken Document Retrieval

Authors

  • Yuk-Chi LI
  • Helen M. MENG
Abstract

This paper presents a method of document expansion using a side collection for improving the overall performance in retrieving spoken documents using text queries. This method is applied to Chinese spoken document retrieval (SDR) tasks where a series of experiments have been carried out for both monolingual and cross-language SDR systems. In our monolingual retrieval experiments, Cantonese broadcast news documents are retrieved using a multi-scale syllable-based approach. Experimental results show that application of document expansion can achieve an improvement of 56% in average inverse rank (AIR). For the cross-language spoken document retrieval (CL-SDR) task where Mandarin broadcast news is retrieved using English textual queries, experimental results show that the use of document expansion has brought 14% relative improvement in retrieval performance.
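
The abstract reports its monolingual results in average inverse rank (AIR) without defining it. A minimal sketch follows, assuming the common definition in which AIR is the mean, over all queries, of the reciprocal of the rank at which the relevant document is retrieved. The function names `average_inverse_rank` and `relative_improvement` are hypothetical and do not come from the paper, and the choice to score a query with no retrieved relevant document as zero is also an assumption.

```python
from typing import Optional, Sequence


def average_inverse_rank(ranks: Sequence[Optional[int]]) -> float:
    """Mean of 1/rank over all queries (assumed definition of AIR).

    ranks[i] is the 1-based rank of the relevant document for query i,
    or None if that document was not retrieved (contributes 0 here,
    which is an assumption, not something the abstract states).
    """
    if not ranks:
        raise ValueError("at least one query is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # missing document adds nothing to the sum
        if rank < 1:
            raise ValueError("ranks are 1-based and must be >= 1")
        total += 1.0 / rank
    return total / len(ranks)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline
```

Under this definition, a system that always ranks the relevant document first scores 1.0, and scores fall toward 0 as relevant documents move down the ranked list.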

Similar resources

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval

This paper presents the application of document expansion using a side collection to a cross-language spoken document retrieval (CL-SDR) task to improve retrieval performance. Document expansion is applied to a series of English-Mandarin CL-SDR experiments using selected retrieval models (probabilistic belief network, vector space model, and HMM-based retrieval model). English textual queries ar...

CLEF-2005 CL-SR at Maryland: Document and Query Expansion using Side Collections and Thesauri

This paper reports results for the University of Maryland’s participation in the CLEF-2005 Cross-Language Speech Retrieval track. Techniques that were tried include: (1) document expansion with manually created metadata (thesaurus keywords and segment summaries) from a large side collection, (2) query refinement with pseudo-relevance feedback, (3) keyword expansion with thesaurus synonyms, and (4) ...

Cross-Language Spoken Document Retrieval on the TREC SDR Collection

This paper presents preliminary experiments on cross-language spoken document retrieval (SDR) carried out on a benchmark assembled at ITC-irst. The benchmark is based on resources used in the last two spoken document retrieval tracks at the TREC conference, which are available on the Internet. They include automatic transcripts of American English broadcast news, short topics written in English,...

Issues in pre- and post-translation document expansion: untranslatable cognates and missegmented words

Query expansion by pseudo-relevance feedback is a well-established technique in both mono- and crosslingual information retrieval, enriching and disambiguating the typically terse queries provided by searchers. Comparable document-side expansion is a relatively more recent development motivated by error-prone transcription and translation processes in spoken document and cross-language retrieval....

Information fusion for monolingual and cross-language spoken document retrieval

Abstract of thesis entitled "Information fusion for monolingual and cross-language spoken document retrieval", submitted by LO Wai-Kit for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in October 2002. Spoken document retrieval (SDR) is an important technique that enables relevant information to be searched from spoken data archives. With the advent of Internet and multimedia te...

Publication date: 2003